DOCUMENT RESUME

ED 156 715                                              TM 007 249

AUTHOR          Levy, Paul S.; French, Dwight K.
TITLE           Synthetic Estimation of State Health Characteristics
                Based on the Health Interview Survey. Vital and
                Health Statistics: Series 2, Data Evaluation and
                Methods Research. Number 75.
INSTITUTION     National Center for Health Services Research
                (DHEW/PHS), Hyattsville, Md.
REPORT NO       DHEW-PHS-78-1349
PUB DATE        Oct 77
NOTE            30p.
AVAILABLE FROM  Superintendent of Documents, U.S. Government Printing
                Office, Washington, D.C. 20402 (Stock Number
                260-937:35, [price illegible])
EDRS PRICE      MF-$0.83 HC-$2.06 Plus Postage.
DESCRIPTORS     Census Figures; Community Health; Data Collection;
                Demography; Error Patterns; *Geographic Regions;
                Mathematical Models; National Surveys; Physical
                Health; *Public Health; *Reliability; Research
                Design; *Sampling; State Surveys; *Statistical
                Analysis; Statistical Bias; *Statistical Surveys
IDENTIFIERS     *Synthetic Estimation

ABSTRACT
Synthetic estimation is a statistical technique that estimates
small-area statistics by combining national estimates of the relevant
characteristics with estimates of other known characteristics of the
small geographic area. The advantages of the synthetic estimation
approach to local estimation are its intuitive appeal, its simplicity,
and its low cost. A major disadvantage is its possible lack of
sensitivity to certain local characteristics. Another method used for
the same purpose is the "nearly unbiased" estimator. The mathematics of
both methods are presented, including formulas for estimating mean
square error and average mean square error. An evaluation of synthetic
estimates is demonstrated by comparing the results of a 50-cell grid
with the results from 2-, 4-, 8-, and 16-cell grids. (Author/CTH)
This report discusses the various methods that have been proposed
or used for obtaining estimates of health characteristics for local
areas. Particular emphasis is given to discussion and evaluation of
synthetic estimation procedures developed originally at the
National Center for Health Statistics for purposes of estimating
levels of health characteristics obtained from the Health Interview
Survey for each State and the District of Columbia.
Paul S. Levy, University of Illinois at the Medical Center, Chicago, and
Dwight K. French, National Center for Health Statistics

INTRODUCTION

Statisticians, demographers, economists, and others have long been
aware of the critical need for accurate small area statistics. While
the U.S. decennial census provides accurate local statistics of many
characteristics once every 10 years, the accuracy of these statistics
becomes questionable as time elapses from the last census and, in
addition, characteristics other than those found on the census
questionnaire are often desired.

Although a rather extensive system of ongoing general purpose surveys
is conducted by Federal agencies, they are almost always designed to
produce estimates for the United States as a whole or, at most, for
rather large geographic regions or divisions. For reasons of sample
size and design, direct estimates for such subdivisions as cities,
counties, States, or other minor civil divisions, which are so
critically needed, can rarely be obtained from these surveys.

The National Center for Health Statistics, one of the Federal agencies
responsible for maintaining a system of sample surveys and other data
collection systems, has long recognized the need for good small area
statistics, and for the past decade has investigated alternate
strategies for obtaining such estimates. In particular, NCHS has
developed a procedure known as "synthetic estimation" for obtaining
small area estimates. This procedure obtains small area estimates of
characteristics by combining national estimates of the characteristics
specific to demographic subgroups with estimates of the proportional
distribution of the local population into these subgroups. The
subgroups would be chosen for their relevance to the characteristic
being estimated. For example, if it were desired to estimate the
prevalence of the sickle cell trait in a particular county having a
racial distribution of 30 percent white and 70 percent black, and a
hypothetical national survey estimated that the trait is prevalent
among 10 percent of U.S. blacks and virtually nonexistent among U.S.
whites, one would estimate that 7 percent of the population of the
county had the trait (30% × 0% + 70% × 10%). This is a synthetic
estimate.

The advantages of the synthetic estimation approach to local
estimation are its intuitive appeal, its simplicity, and its low cost
relative to a survey of the local population. A major disadvantage is
its lack of sensitivity to certain local characteristics. For example,
in the above illustration, it may happen that the white population in
the area are all of Mediterranean descent and have more than a
negligible amount of the sickle cell trait.

Much research on the synthetic estimation method has emerged since an
NCHS report on synthetic estimates of disability for States was
published in 1968.¹ The purpose of this report is to examine
critically the various methods for obtaining local estimates that are
in the literature and, in particular, to examine synthetic estimation
from a methodological point of view.
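
The sickle cell illustration in the Introduction amounts to a weighted average of national subgroup rates, with the local population shares as weights. A minimal Python sketch follows; the function name `synthetic_estimate` and the dictionary layout are assumptions made for illustration, and the figures are the hypothetical ones used in the text.

```python
def synthetic_estimate(local_shares, national_rates):
    """Weighted average of national subgroup rates using local population shares."""
    # The local shares must describe the whole local population.
    if abs(sum(local_shares.values()) - 1.0) > 1e-9:
        raise ValueError("local population shares must sum to 1")
    return sum(share * national_rates[group]
               for group, share in local_shares.items())


# Hypothetical figures from the illustration in the text.
county_shares = {"white": 0.30, "black": 0.70}
national_prevalence = {"white": 0.00, "black": 0.10}

estimate = synthetic_estimate(county_shares, national_prevalence)
print(f"{estimate:.0%}")
```

With these inputs the sketch reproduces the 7 percent figure quoted in the illustration.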

REVIEW OF THE LITERATURE 

The need for methods of obtaining valid and reliable estimates of
characteristics of local populations has been recognized for a long
time by statisticians and demographers. In particular, much effort has
been expended by statisticians associated with the U.S. Bureau of the
Census and their contractors, especially in the use of symptomatic
variables such as births, deaths, and school enrollment, which are
available on a local level, to measure changes in population size
since the most recent decennial census. Methods such as the vital
rates technique, censal ratio method, Census Bureau Component Methods
I and II, ratio correlation method, and others have been described
extensively in the literature.² Basically, these methods use the
relationship between the population size of the local area at the most
recent census and the measure of the symptomatic variable or variables
for that year, in conjunction with the value of the symptomatic
variable(s) at the date for which the estimate is desired, to produce
the desired local estimate of population size or change. An
elaboration of the use of techniques based on symptomatic variables
has been developed recently by Ericksen.³⁻⁵ His elaboration involves
use of sample data from the Current Population Survey in conjunction
with symptomatic variables to obtain estimates of population size for
local areas.

Although health statisticians have long felt the need for valid and
reliable estimates of health characteristics for local areas, only in
the past decade has serious attention been given to the development of
methodology for obtaining local area estimates of such health
characteristics as morbidity, mortality, disability, and utilization
of health care services. The methods developed by demographers for
estimating local population sizes could not, however, be directly
applied to the estimation of health characteristics for local areas;
hence, methodology for estimating health conditions for local areas
has developed along different lines from those discussed above for
local estimation of population size.

A major advance in estimation of health characteristics for local
areas came with an experiment conducted by Walt R. Simmons and his
staff at the National Center for Health Statistics (NCHS) during the
mid-1960's and published in 1968.¹ In this experiment, three different
estimation techniques were used to produce estimates of long- and
short-term disability for each State in the United States for the
2-year period beginning July 1, 1962, and ending June 30, 1964. The
NCHS data used for estimating disability were from the Health
Interview Survey (HIS), and the population data were from the Current
Population Survey update of the 1960 Decennial Census.

One of the methods used was proposed originally by Woodruff to produce
local estimates of retail trade;⁶ the other two, namely, the synthetic
estimator and the nearly unbiased estimator, were developed at NCHS.
These methods will be discussed in greater detail. Of the three
methods investigated, the synthetic estimator was judged to be the
most promising for estimating disability on the State level, and the
estimates finally published¹ were obtained by this method.

The NCHS publication of synthetic estimates of disability¹ seemed to
stimulate further efforts to apply and evaluate synthetic estimation.
Within NCHS, an evaluation of the synthetic estimation procedure was
conducted in which synthetic estimates of death rates in 1960 from
four causes (motor vehicle accidents, major cardiovascular-renal
diseases, suicide, and tuberculosis) were calculated for each State
and for the District of Columbia.⁷ These synthetic estimates of death
rates were then compared to the known true death rates for each State
and, in general, agreement between synthetic estimates and true death
rates was good for one of the causes examined (major
cardiovascular-renal diseases), fair for another (suicides), and poor
for the other two (motor vehicle accidents and tuberculosis). The
general conclusion from the study was that the validity and
reliability of synthetic estimates might differ from characteristic to
characteristic.

As part of the NCHS study of death rates, an alternative estimation
procedure was developed. The resulting estimator, called the
regression-adjusted estimator, uses the synthetic estimate in
combination with ancillary data available on the State level and
thought to be correlated with the health characteristic to be
estimated.⁷ This estimate, for at least one of the causes of death
examined, seemed to be an improvement over the synthetic estimator.

After the NCHS publication on synthetic estimates of disability, the
Bureau of the Census produced synthetic estimates of unemployment
rates and number of dilapidated housing units that had all plumbing
facilities for States, SMSA's, and counties.⁸⁻¹⁰ In addition,
extensive studies were undertaken to evaluate the synthetic estimates.
An important result of these studies was the emergence of a criterion,
called the average mean square error (AMSE), as a proposed measure of
the accuracy of a set of synthetic estimates, and the development of a
method for estimating the AMSE.^' 11 These 
methods will be discussed in greater detail later 
in this report. 

Most recently, Namekata, Levy, and O'Rourke¹² investigated the use of
synthetic estimation in obtaining estimates of complete and partial
work disability for States based on data from the 1970 census. The
synthetic estimates were obtained and compared with the direct
estimates that were available from the 1970 Decennial Census for each
State. Their general conclusions were that the synthetic estimation
technique was fairly good for partial work disability but fairly poor
for complete work disability.

ALTERNATIVE METHODS OF 
OBTAINING ESTIMATES 

Background

In the original NCHS investigation of alternative procedures for small
area estimation of health characteristics from the Health Interview
Survey (HIS), several procedures were considered.¹ In this section, we
will discuss in detail two of the methods, namely, the nearly unbiased
estimator and the synthetic estimator. In addition, we will discuss an
estimator, called the regression-adjusted estimator, not considered in
the original investigation but developed in a later study.⁷

One of the problems in obtaining estimates for States of health
characteristics based on HIS data is the fact that the basic design of
the HIS does not lend itself to unbiased estimates for States. In the
basic HIS design, a primary sampling unit (PSU), which is generally a
county or SMSA, is chosen to represent a stratum consisting of one or
more demographically similar PSU's. Those strata consisting of more
than one PSU are called non-self-representing strata, and their
component PSU's may not be from the same State, although they would be
from the same census region (Northeast, North Central, South, or
West). Thus, the estimate from a sample PSU when inflated to represent
the entire stratum might cut across State boundaries, and hence, it
would be impossible to combine the unbiased estimates for strata into
unbiased estimates for States.

Nearly Unbiased Estimator 

One of the methods considered in the original NCHS investigation is
called the nearly unbiased estimator and yields an estimate for a
State that is technically nearly unbiased. Basically, this procedure
takes the usual HIS stratum estimate for an aggregate and allocates it
to a State in relation to the proportion of the total stratum
population coming from the State. In other words, the nearly unbiased
estimate $X'_s$ of the mean level of characteristic X for State s is
given by

$$X'_s = \sum_{j} \frac{n_{sj\cdot}}{n_{s\cdot\cdot}}\, X'_j \qquad (1)$$

where

$X'_j$ = unbiased HIS estimate for the mean level of X for stratum j,

$n_{s\cdot\cdot} = \sum_{j} n_{sj\cdot}$ = the number of persons in State s,

$n_{sj\cdot} = \sum_{i} n_{sji}$ = the number of persons in that portion of
stratum j which is in State s,

$n_{sji}$ = the number of persons in the jth stratum, sth State, ith PSU;
s = 1, ..., S; i = 1, ..., $I_{sj}$, and

$I_{sj}$ = the number of PSU's in stratum j that are in State s.
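
As a sketch of the allocation just described, the Python fragment below forms the estimate for one State as a weighted average of stratum estimates, with weights equal to the share of the State's population that falls in each stratum. All names (`nearly_unbiased_estimate`, `state_pop_by_stratum`, `stratum_estimates`) and all numbers are hypothetical, and the State total is assumed to be the sum of its stratum portions.

```python
def nearly_unbiased_estimate(state_pop_by_stratum, stratum_estimates):
    """Sum over strata j of (n_sj. / n_s..) * X'_j for a single State s."""
    n_state = sum(state_pop_by_stratum.values())  # n_s..: persons in State s
    return sum((n_sj / n_state) * stratum_estimates[j]
               for j, n_sj in state_pop_by_stratum.items())


# Hypothetical State whose population lies in three strata.
state_pop_by_stratum = {"A": 600_000, "B": 300_000, "C": 100_000}  # n_sj.
stratum_estimates = {"A": 10.0, "B": 12.0, "C": 15.0}              # X'_j

x_state = nearly_unbiased_estimate(state_pop_by_stratum, stratum_estimates)
print(round(x_state, 2))
```

Because each stratum estimate describes the whole stratum rather than only the part inside the State, the result is unbiased only when the State's portion of each stratum resembles the stratum as a whole.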

Some properties of the nearly unbiased estimate $X'_s$ are given
below. The proofs are presented in appendix I.

Lemma 1: The expectation $E(X'_s)$ of the nearly unbiased estimator
$X'_s$ is given by

$$E(X'_s) = \sum_{j} \frac{n_{sj\cdot}}{n_{s\cdot\cdot}}\, \bar{X}_{\cdot j\cdot} \qquad (2)$$

where

$\bar{X}_{\cdot j\cdot} = \frac{1}{n_{\cdot j\cdot}} \sum_{s} \sum_{i} n_{sji} \bar{X}_{sji}$ = average level of X in stratum j,

$n_{\cdot j\cdot} = \sum_{s} n_{sj\cdot}$ = number of persons in stratum j, and

$\bar{X}_{sji}$ = average level of characteristic X in that portion of
State s which is in PSU i of stratum j.

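Lemma 2: The bias $B(X'_s)$ in the nearly unbiased estimate $X'_s$ can
be expressed by

$$B(X'_s) = \sum_{j} \frac{n_{sj\cdot}}{n_{s\cdot\cdot}} \left( \bar{X}_{\cdot j\cdot} - \bar{X}_{sj\cdot} \right) \qquad (3)$$

where $\bar{X}_{sj\cdot}$ = average level of X in that portion of stratum j
which is in State s, or equivalently by a second expression.
[The equivalent expression is not legible in this copy.]

Lemma 3: Let us assume that the ratio $n_{sji}/n_{sj\cdot}$ is the same
for all PSU's in the same stratum and the same State. Then the bias
$B(X'_s)$ in the nearly unbiased estimate $X'_s$ takes a simplified
form. [The implied condition and the resulting expression are not
legible in this copy.]

Theorem 4: If $n_{sji}/n_{sj\cdot} = 1/I_{sj}$ for i = 1, ..., $I_{sj}$, then
$B^2(X'_s)$, the square of the bias in $X'_s$, is given by an expression
whose components are interpreted in table A. [The expression is not
legible in this copy.]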
Table A. Interpretation of the components of the square of the bias of
the "nearly" unbiased estimate

Component    Interpretation

First        Represents difference between the average level of X for
             a stratum and the average level of X for the portion of
             the stratum that is in State s.

Second       Represents variance in X among PSU's belonging to the
             same State and stratum.

Third        Represents a "between-strata" covariance.

Fourth       Represents a "between-PSU" covariance.

Theorem 4 implies that under the condition that $n_{sji}/n_{sj\cdot}$ is
the same for all PSU's within the same stratum and belonging to the
same State, the square of the bias of the nearly unbiased estimate
$X'_s$ consists of the components specified in table A.

It can be shown that the third component of $B^2(X'_s)$ can be
transformed to an equivalent algebraic form. Thus, an equivalent
expression for the square of the bias in the nearly unbiased estimate
$X'_s$ is given by relation (8). [The transformed component and
relation (8) are not legible in this copy.]

Variance and mean square error of the nearly unbiased estimate.—The
variance $\sigma^2_{X'_s}$ can be obtained directly from its definitional
formula. This is given by

$$\sigma^2_{X'_s} = \sum_{j} \left( \frac{n_{sj\cdot}}{n_{s\cdot\cdot}} \right)^2 \sigma^2_{X'_j} \qquad (9)$$

where $\sigma^2_{X'_j}$ is the variance of the unbiased estimate of the
mean level of X in stratum j.

It follows that the mean square error $\mathrm{MSE}(X'_s)$ of the nearly
unbiased estimate $X'_s$ is given by

$$\mathrm{MSE}(X'_s) = \sigma^2_{X'_s} + B^2(X'_s)$$

where $\sigma^2_{X'_s}$ is given by relation (9), and $B^2(X'_s)$ is given
by relation (8) (under the condition that $n_{sji}/n_{sj\cdot}$ is the same
for all PSU's within the same stratum and State).

Evaluation of the nearly unbiased estimator.—In the original NCHS
investigation of methods for obtaining local estimates, the nearly
unbiased estimate did not emerge as the method of choice for producing
these local area HIS estimates of health characteristics primarily
because examination of the estimates produced by this method showed
evidence that they were unstable.¹

A later study was performed at NCHS to determine the extent to which
the nearly unbiased estimator might be biased. The data base chosen
for this study was mortality data in 1960 for the 42 States in the
North Central, South, and West Regions of the United States. In
particular, nearly unbiased estimates of total deaths, deaths from
motor vehicle accidents, and deaths from major cardiovascular-renal
diseases were obtained for each State using the same stratification
that is used in the Health Interview Survey. These nearly unbiased
estimates were then compared with the true number of deaths in each
State in the three regions examined, and the biases of the estimators
were evaluated by the percentage absolute difference
$100\,|X'_s - X_s| / X_s$, where $X_s$ is the true value for State s.



Table B. Distribution of percentage absolute difference between the
nearly unbiased estimate and the true value among 42 States in the
North Central, South, and West Regions for total deaths, deaths from
major cardiovascular-renal diseases, and deaths from motor vehicle
accidents, 1960

Percentage absolute                    Cause of death
difference between          ------------------------------------------
nearly unbiased             Total     Major cardiovascular-   Motor vehicle
estimate and true value     deaths    renal diseases          accidents

                                           Frequency

    Total                     42             42                   42

0.0-0.9                       16             15                   12
1.0-1.9                        6              8                    6
2.0-2.9                        6              5                    4
3.0-3.9                        5              3                    1
4.0-4.9                        1              2                    3
5.0-5.9                        1              2                    3
6.0-6.9                        3              2                    1
7.0-7.9                        0              1                    2
8.0-8.9                        2              0                    5
9.0-9.9                        2              4                    5

                                            Percent

Median percent difference    1.78           1.70                 2.70

The distribution of percentage absolute difference is given in table
B. The median percentage absolute difference was 1.78 percent for
total deaths, 1.70 percent for major cardiovascular-renal deaths, and
2.70 percent for motor vehicle accident deaths. The small biases
obtained from this empirical study would yield the interpretation that
for the stratification used in HIS, the nearly unbiased estimator is
in fact an estimator having small bias.

However, in a given year, the number of households interviewed in a
particular stratum might be quite small for the Health Interview
Survey. It is, therefore, anticipated that $\sigma^2_{X'_j}$ and hence
$\sigma^2_{X'_s}$, the sampling variance of the nearly unbiased
estimator, might be quite large. In addition, the $\sigma^2_{X'_j}$ are
difficult to estimate from the data. Hence, in terms of sampling
variance, the nearly unbiased estimate $X'_s$ might not be the method
of choice.¹

Synthetic Estimator 

Background.—The other method for obtaining local estimates of health
characteristics investigated in the original NCHS study¹ is known as
synthetic estimation and was the method finally chosen for producing
local estimates of HIS health characteristics for States in 1963-64
and again in 1969-71.¹³ The underlying rationale for synthetic
estimation is that the distribution of a health characteristic does
not vary among populations of States except to the extent that States
vary in demographic composition. In other words, the method assumes
that the incidence or prevalence of a health characteristic would be
the same for two States if their composition were the same with
respect to such demographic variables as age, sex, race, family
income, family size, place of residence, and industry of the head of
the family.

Conceptually, synthetic estimation uses the model given by

$$\bar{X}_s = \sum_{\alpha=1}^{k} P_{s\alpha}\, \bar{X}_\alpha \qquad (10)$$

where

$\bar{X}_s$ = mean level of characteristic X for the sth State,

$P_{s\alpha}$ = proportion of the population of State s who are members of
population cell α (alpha), which is the socioeconomic,
demographically bounded class of specified age, sex, race, income,
etc. The sum over all α cells of $P_{s\alpha}$ = unity.

$\bar{X}_\alpha$ = mean level of characteristic X for persons in cell α in
the United States as a whole, and

k = number of α cells utilized.

In the original NCHS investigation, the $X'_\alpha$ were national
estimates of HIS variables for the period July 1962-June 1964. The
population estimates $P'_{s\alpha}$ were obtained from tabulations of a
5-percent sample questionnaire of the 1960 Decennial Census of
Population for the 50 States and the District of Columbia. The
population α cells were defined by cross-classifications of the
following variables:

1. Color: white; all other

2. Sex: male; female

3. Age group: under 17 years; 17-44 years; 45-64 years; 65 years and
   over

4. Residence: standard metropolitan statistical area (SMSA)-central
   city; SMSA-not central city; not SMSA

5. Family income: under $4,000; $4,000 and over

6. Family size: fewer than seven members; seven members or more

7. Industry of head of family: Standard Industrial Classification
   codes 1 through 17 (Forestry and Fisheries, Agriculture, and
   Construction) and codes 19 and over (All Other Industries)

The 384 possible cross-classification cells were collapsed to 78 so
that reliable estimates could be obtained from the Health Interview
Survey for each α cell.
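
The figure of 384 cells follows from multiplying the number of categories of the seven variables listed above (2 × 2 × 4 × 3 × 2 × 2 × 2). The short Python check below enumerates the full cross-classification; the dictionary keys and category labels are shorthand chosen for this sketch, not names taken from the report.

```python
from itertools import product

# Categories as listed in the text for the original NCHS investigation.
variables = {
    "color": ["white", "all other"],
    "sex": ["male", "female"],
    "age": ["under 17", "17-44", "45-64", "65 and over"],
    "residence": ["SMSA, central city", "SMSA, not central city", "not SMSA"],
    "family_income": ["under $4,000", "$4,000 and over"],
    "family_size": ["fewer than seven", "seven or more"],
    "industry": ["SIC codes 1-17", "all other industries"],
}

# Every combination of one category per variable is one potential cell.
cells = list(product(*variables.values()))
print(len(cells))

# Dropping the three-category residence variable leaves a third as many cells.
cells_without_residence = list(
    product(*(levels for name, levels in variables.items() if name != "residence"))
)
print(len(cells_without_residence))
```

The second count corresponds to a classification on six two- or four-category variables with residence omitted.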

For the synthetic estimates for the years 1969-71, HIS data from the
three surveys of 1969, 1970, and 1971 were used to obtain the rates or
percentages of the health characteristics measured. The populations of
the 50 States and the District of Columbia were obtained from a sample
described in a publication of the U.S. Bureau of the Census entitled
Public Use Samples of Basic Records From the 1970 Decennial Census,
Description and Technical Documentation, published in 1972. Of six
such samples, the one used was the State Public Use Sample from the
5-percent questionnaires. Persons in the military or confined to
institutions were not included in the population estimates produced
for each State. Thus the restriction of the HIS samples to the
civilian noninstitutionalized population was carried over to these
synthetic estimates. Of the seven variables used to produce the 78 α
cells for the original report, only six were available in the Public
Use Sample used to produce these synthetic estimates. The variable
that was not available was residence in standard metropolitan
statistical areas. The six variables can produce a possible 128 cells
of data. These were collapsed to 50 α cells for which reliable
national estimates from the Health Interview Survey could be provided.
A regional adjustment (as specified below) was employed for the State
estimates within each of the four geographic regions of the United
States to make these estimates consistent with the regional estimates
produced by the probability design of the Health Interview Survey.

In summary, the synthetic estimates produced for HIS health
characteristics for the years 1969-71¹³ use the same basic method as
was used in the original NCHS investigation. However, in addition to
estimates of long- and short-term disability, estimates of utilization
of medical services were provided as well as estimates for subdomains
of the population of each State (age, sex, color, and family income).
Also, a methodology has been developed for providing sampling
variances of synthetic estimates.

Detailed synthetic estimate.—The detailed synthetic estimate
$\tilde{X}_s$ of the mean level of characteristic X for State s is given
by

$$\tilde{X}_s = \tilde{X}_{s1}\, \frac{X'_r}{\sum_{t=1}^{T} P_{rt}\, \tilde{X}_{t1}} \qquad (11)$$

where

$\tilde{X}_s$ = final synthetic estimate of the mean level of
characteristic X for State s,

$X'_r$ = the usual HIS final estimate of the mean level of
characteristic X for region r (where region r contains State s),

$P_{rt}$ = proportion of the population (from the 1970 Decennial Census)
of region r which is in State t (t = 1, ..., T),

T = number of States in region r,

$\tilde{X}_{s1} = \sum_{\alpha=1}^{k} P'_{s\alpha} X'_\alpha$ = first-stage synthetic
estimate of characteristic X for State s,

$X'_\alpha$ = final HIS estimate of the mean level of characteristic X for
demographic cell α for the United States,

$P'_{s\alpha}$ = estimated proportion of the 1970 population in State s
belonging to cell α (as estimated from the 1970 U.S. Census 1-Percent
Public Use Tapes), and

k = number of α cells.
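
The two-stage calculation can be sketched in Python as follows: a first-stage estimate is formed for every State in a region, and all of them are then scaled by one common ratio so that their population-weighted average matches the published regional figure. This is a sketch under that reading of equation (11); the function names, the data layout, and the numbers are hypothetical.

```python
def first_stage(cell_shares, national_cell_means):
    """First-stage synthetic estimate: sum over cells of P'_sa * X'_a."""
    return sum(p * x for p, x in zip(cell_shares, national_cell_means))


def regionally_adjusted(states, national_cell_means, region_estimate):
    """Scale first-stage estimates so their weighted mean equals X'_r.

    `states` maps a State name to (share of regional population, cell shares).
    """
    first = {name: first_stage(shares, national_cell_means)
             for name, (_, shares) in states.items()}
    # Regional rate derived from the State estimates (denominator of the ratio).
    derived_regional = sum(p_rt * first[name]
                           for name, (p_rt, _) in states.items())
    ratio = region_estimate / derived_regional
    return {name: value * ratio for name, value in first.items()}


# Hypothetical two-State region with three demographic cells.
national_cell_means = [2.0, 5.0, 9.0]
states = {
    "State 1": (0.75, [0.5, 0.3, 0.2]),
    "State 2": (0.25, [0.2, 0.3, 0.5]),
}
final = regionally_adjusted(states, national_cell_means, region_estimate=5.0)
print(final)
```

The adjustment changes the level of every State estimate in the region by the same factor and leaves their relative ordering untouched.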



Synthetic estimates $\tilde{X}_{su}$ for subdomains (age, sex, color, and
income) are given by the estimator

$$\tilde{X}_{su} = \tilde{X}_{su1}\, \frac{X'_r}{\sum_{t=1}^{T} P_{rt}\, \tilde{X}_{t1}} \qquad (12)$$

where

$\tilde{X}_{su}$ = the final synthetic estimate for the mean level of
characteristic X for subdomain u within State s,

$\tilde{X}_{su1} = \sum_{\alpha \in u} \frac{P'_{s\alpha}}{f_{su}}\, X'_\alpha$ = preliminary
synthetic estimate for the mean level of characteristic X in
subdomain u of State s,

$P'_{s\alpha}$ = the estimated proportion (as estimated from the U.S.
Census 1970 1-Percent Public Use Sample Tapes) of the population of
State s belonging to cell α, and

$f_{su}$ = the estimated proportion of the population of State s
belonging in subdomain u, except for synthetic estimates of work-loss
days per person per year in age groups. By definition, HIS excludes
all persons under 17 years of age from the employed population.
Therefore, the factor $f_{su}$ in the denominator of the ratio
adjustment for these statistics is redefined as the estimated
proportion of the population age 17 and over in State s that belongs
to subdomain (age group) u.

The synthetic estimates for subdomains as given by equation (12) are
ratio adjusted so that the aggregates are consistent with the final
synthetic estimates for the State as a whole.

The α variables were limited to those listed below:

Color: white; other than white.

Sex: male; female.

Age group: under 17 years; 17-44 years; 45-64 years; 65 years and
over.

Family income: under $5,000; $5,000 and over.

Family size: fewer than seven members; seven members or more.

Industry of head of family: Standard Industrial Classification Codes 1
through 17 (agriculture, forestry, fisheries, mining, and
construction); 18 and above (all others).

The 128 cross-classification cells produced by these variables were
collapsed into 50 α cells for which reliable national estimates from
the Health Interview Survey could be made.

The ratio adjustment $X'_r / \sum_{t} P_{rt}\, \tilde{X}_{t1}$ was included in
order to reflect a regional component in final estimates. It is the
ratio of the published regional figure to the preliminary, derived
regional rate calculated from the State estimates.

Estimation of sampling errors of synthetic estimates for HIS
characteristics 1969-71.—The synthetic estimates presented for the
1969-71 HIS data are subject to sampling variability from two sources
because they are based on HIS estimates and estimated population
proportions. (When synthetic estimates are computed from known
proportions and population means they are not subject to sampling
error, since there would be no sampling involved in the synthetic
estimation procedure.) The sampling variance of a synthetic estimate
$\tilde{X}_s$ (ignoring the regional adjustment) is given by

$$\sigma^2_{\tilde{X}_s} = \mathrm{Var}\Big(\sum_{\alpha} P'_{s\alpha} X'_\alpha\Big) = \sum_{\alpha} \mathrm{Var}(P'_{s\alpha} X'_\alpha) + 2 \sum_{\alpha < \alpha'} \mathrm{Cov}(P'_{s\alpha} X'_\alpha,\; P'_{s\alpha'} X'_{\alpha'}) \qquad (13)$$

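Theorem 5: If the $P'_{s\alpha}$ are independent of the $X'_\alpha$, then the
variance of $\tilde{X}_s$ given by equation (13) reduces to

$$\sigma^2_{\tilde{X}_s} \approx \sum_{\alpha} P_{s\alpha}^2\, \sigma^2_{X'_\alpha} + \frac{1}{n_s} \sum_{\alpha} \bar{X}_\alpha^2\, P_{s\alpha}(1 - P_{s\alpha}) + 2 \sum_{\alpha < \alpha'} P_{s\alpha} P_{s\alpha'}\, \mathrm{Cov}(X'_\alpha, X'_{\alpha'}) \qquad (14)$$

for large values of $n_s$.

The first and third terms of equation (14) represent the variance of
$\tilde{X}_s$ if the $P'_{s\alpha}$ were not subject to sampling variation.
Thus, the effect of sampling variation of the $P'_{s\alpha}$ on
$\sigma^2_{\tilde{X}_s}$ is measured by the expression

$$\frac{1}{n_s} \sum_{\alpha} \bar{X}_\alpha^2\, P_{s\alpha}(1 - P_{s\alpha}). \qquad (15)$$

If $P^* = \min(P_{s1}, \ldots, P_{sk})$ and $V^2_{\tilde{X}_s}$ = rel-variance of
$\tilde{X}_s$, we have

$$V^2_{\tilde{X}_s} = \frac{\sum_{\alpha} P_{s\alpha}^2 \sigma^2_{X'_\alpha} + 2 \sum_{\alpha < \alpha'} P_{s\alpha} P_{s\alpha'} \mathrm{Cov}(X'_\alpha, X'_{\alpha'})}{[E(\tilde{X}_s)]^2} + \frac{\sum_{\alpha} \bar{X}_\alpha^2 P_{s\alpha}(1 - P_{s\alpha})}{n_s\, [E(\tilde{X}_s)]^2}.$$

But, for nonnegative $\bar{X}_\alpha$,

$$\sum_{\alpha} \bar{X}_\alpha^2 P_{s\alpha}(1 - P_{s\alpha}) \le (1 - P^*) \sum_{\alpha} \bar{X}_\alpha^2 P_{s\alpha}$$

and

$$[E(\tilde{X}_s)]^2 = \Big(\sum_{\alpha} P_{s\alpha} \bar{X}_\alpha\Big)^2 \ge \sum_{\alpha} P_{s\alpha}^2 \bar{X}_\alpha^2 \ge P^* \sum_{\alpha} \bar{X}_\alpha^2 P_{s\alpha};$$

therefore,

$$\sum_{\alpha} \bar{X}_\alpha^2 P_{s\alpha}(1 - P_{s\alpha}) \le \frac{1 - P^*}{P^*}\, [E(\tilde{X}_s)]^2,$$

and the rel-variance of the synthetic estimate $\tilde{X}_s$ satisfies the
inequality given by

$$V^2_{\tilde{X}_s} \le \frac{\sum_{\alpha} P_{s\alpha}^2 \sigma^2_{X'_\alpha} + 2 \sum_{\alpha < \alpha'} P_{s\alpha} P_{s\alpha'} \mathrm{Cov}(X'_\alpha, X'_{\alpha'})}{[E(\tilde{X}_s)]^2} + \frac{1 - P^*}{P^*\, n_s}. \qquad (16)$$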
Since the first term in the right-hand side of relation (16) is the
relative variance of $\tilde{X}_s$ under conditions that the $P'_{s\alpha}$
are not subject to sampling error, the effect of sampling error in the
$P'_{s\alpha}$ on the relative variance of $\tilde{X}_s$ would therefore be
less than $(1 - P^*)/(P^* n_s)$, the second term in the right-hand side
of relation (16). This is summarized in table C.

As is seen in table C, for all but one or two of the smallest States,
the effect of sampling variance in the $P'_{s\alpha}$ on the relative
variance of the synthetic estimator $\tilde{X}_s$ would be quite small.
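
The entries of table C can be regenerated from the bound quoted in the text, read here as $(1 - P^*)/(P^* n_s)$. The Python sketch below does so; the function name is hypothetical, and treating $n_s$ as the sample size behind the State proportions is an assumption of this sketch rather than a definition given in the passage.

```python
def max_relative_variance_contribution(p_min, n_s):
    """Upper bound (1 - P*) / (P* n_s) on the added relative variance."""
    return (1.0 - p_min) / (p_min * n_s)


# Same grid of P* and n_s values as table C.
for n_s in (1_000, 10_000, 100_000):
    row = [max_relative_variance_contribution(p, n_s)
           for p in (0.0001, 0.001, 0.01, 0.05, 0.10)]
    print(n_s, ["%.5f" % value for value in row])
```

The bound falls in proportion to both the smallest cell share and the sample size, which is why only very small States with very sparse cells are affected noticeably.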

The approximate variance of $\tilde{X}_{su}$, the synthetic estimator for
subdomains, can be expressed in a form parallel to expression (14),
with sums taken over the α cells belonging to subdomain u; this is
equation (17). [Equation (17) is not legible in this copy.]



Sampling variances of synthetic estimates of HIS health
characteristics for the 3-year period 1969-71 were obtained based on
equations (14) and (17) with the following two modifications made to
simplify the calculations:

1. $P_{s\alpha} = P_\alpha$ for all α, where $P_\alpha$ represents the proportion of
   the U.S. population in cell α. That is, the proportion of the
   population in any α cell is approximately the same for all States.

2. $\mathrm{Cov}(X'_\alpha, X'_{\alpha'}) = 0$ for all α < α', so that the third term
   of equations (14) and (17) drops out.

Under these assumptions equation (14) reduces 

to : 



and equation (17) becomes 

at ±Y,(t) 2 ° 2 . 

. ^L^7(i-t)^9) 

* aeu Ju \ Ju/ 

where 



Equations (18) and (19) were the expressions used to compute sampling errors for the estimates in the report.¹³ Almost all the estimates had sampling errors that were very small relative to the size of the estimates themselves. The relative standard error (RSE), defined by

$$\mathrm{RSE}(\tilde{X}_s) = \frac{\sqrt{\hat{\sigma}^2_{\tilde{X}_s}}}{\tilde{X}_s},$$

was 5 percent or less for virtually all statistics in the report, even for the smallest States. The only important exceptions occurred for estimates of the proportion of persons in certain population subgroups who were unable to carry on major activity. The most variable subgroup was the under-45 age group, where the RSE ranged from 7.4 percent for the entire United States to 10.4 percent for Alaska. The highest single RSE was 11.6 percent for white persons in Alaska. Although it may seem strange for State estimates to have such small sampling errors, these estimates were essentially weighted averages of national HIS estimates based on 3 years of data collection.

Table C. Maximum contribution to the relative variance of $\tilde{X}_s$ of sampling variation in the $P'_{s\alpha}$

                                $P_{s\alpha}$
n_s            .0001       .001        .01         .05         .10

               Maximum contribution to the relative variance of $\tilde{X}_s$

1,000          10.00       --          .10         .019        .009
10,000          1.00       .10         .01         .0019       .0009
100,000         0.10       .01         .001        .00019      .00009

-- = entry illegible in the source.
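
The legible entries of table C appear consistent with the relative variance of a binomial proportion, (1 - P)/(nP). That reading is an assumption made here, not something the report states, and the function name below is hypothetical. A minimal sketch that recomputes the grid under that assumption:

```python
def relvar_of_proportion(p: float, n: int) -> float:
    """Relative variance, Var(p_hat) / p**2, of a binomial proportion.

    Assumes simple random sampling of n persons; this is an assumption
    made for illustration, not a formula quoted from the report.
    """
    return (1.0 - p) / (n * p)


if __name__ == "__main__":
    # Values of P and n chosen to mirror the layout of table C.
    proportions = (0.0001, 0.001, 0.01, 0.05, 0.10)
    for n in (1_000, 10_000, 100_000):
        row = [relvar_of_proportion(p, n) for p in proportions]
        print(n, ["%.5g" % value for value in row])
```

Under this reading, the contribution shrinks by a factor of 10 each time the State sample size grows by a factor of 10, which is the pattern visible down each column of the table.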

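Bias of synthetic estimator. The synthetic estimator is a biased estimator with the bias $B(\tilde{X}_s)$ given by

$$B(\tilde{X}_s) = \sum_{\alpha} P_{s\alpha}\,(\bar{X}_\alpha - \bar{X}_{s\alpha}) \qquad (20)$$

where $\bar{X}_\alpha$ is the national mean level of characteristic X for demographic cell α and $\bar{X}_{s\alpha}$ is the true mean level of characteristic X for demographic cell α in State s.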
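
To make the source of the bias concrete, here is a small sketch with made-up numbers. It assumes the synthetic estimate is the State's cell proportions applied to national cell means, and that the bias is the proportion-weighted sum of differences between national and State cell means. All function names and figures are hypothetical.

```python
from typing import Sequence


def synthetic_estimate(state_props: Sequence[float],
                       national_means: Sequence[float]) -> float:
    """Weight national cell means by the State's cell proportions."""
    return sum(p * x for p, x in zip(state_props, national_means))


def true_state_mean(state_props: Sequence[float],
                    state_cell_means: Sequence[float]) -> float:
    """Weight the State's own cell means by the same proportions."""
    return sum(p * x for p, x in zip(state_props, state_cell_means))


def synthetic_bias(state_props: Sequence[float],
                   national_means: Sequence[float],
                   state_cell_means: Sequence[float]) -> float:
    """Sum over cells of P * (national cell mean - State cell mean)."""
    return sum(p * (xn - xs)
               for p, xn, xs in zip(state_props, national_means, state_cell_means))


# Hypothetical three-cell example (not data from the report).
props = [0.5, 0.3, 0.2]
national = [10.0, 20.0, 30.0]
state = [11.0, 19.0, 30.0]

estimate = synthetic_estimate(props, national)
bias = synthetic_bias(props, national, state)
print(estimate, bias)
```

The bias vanishes only when the State's cell means match the national cell means, or when the differences happen to cancel across cells.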

Regression-Adjusted Estimator

Background. One of the basic limitations of the synthetic estimator $\tilde{X}_s$ is that it is adjusted only for the specific set of demographic cells (or α cells) taken into consideration. If the parameter being estimated is influenced by variables other than those taken into consideration by the α cells, then the synthetic estimator will not reflect this influence. Often it is not possible to include in the α-cell array all the variables thought to be important because data on these variables are not available in sufficient demographic detail. However, although a particular variable might not be usable in the synthetic estimator, it can often be taken into consideration in other ways. In an earlier article,⁷ a method was proposed to take such variables into consideration.

Method of estimation. The method presented below uses the synthetic estimator $\tilde{X}_s$ in conjunction with a set of ancillary variables $z_{s1}, \ldots, z_{sm}$ to produce an adjusted synthetic estimator. In particular, we assume the linear model given by

$$Y_s = \alpha + \beta_1 z_{s1} + \cdots + \beta_m z_{sm} + \varepsilon_s \qquad (21)$$

where $Y_s$, the percentage difference between the synthetic estimate $\tilde{X}_s$ and the true value $\bar{X}_s$ of characteristic X for State s, is given by

$$Y_s = \frac{\bar{X}_s - \tilde{X}_s}{\tilde{X}_s} \times 100,$$

$\varepsilon_s$ = term representing random error,

$z_{s1}, \ldots, z_{sm}$ = values of variables $z_1, \ldots, z_m$ for State s, and

$\alpha, \beta_1, \ldots, \beta_m$ = regression coefficients to be estimated.

If estimates $\hat{\alpha}$ of α, $\hat{\beta}_1$ of $\beta_1$, ..., and $\hat{\beta}_m$ of $\beta_m$ were available and substituted into the right-hand side of equation (21), algebraic manipulation would result in an estimator $\hat{X}_s$ of $\bar{X}_s$ given by

$$\hat{X}_s = \tilde{X}_s\,[\,1 + 0.01\,(\hat{\alpha} + \hat{\beta}_1 z_{s1} + \cdots + \hat{\beta}_m z_{sm})\,] \qquad (22)$$

Equation (21) states that the percentage difference $Y_s$ between the synthetic estimate $\tilde{X}_s$ and the true value $\bar{X}_s$ is a linear function of a set of variables $z_{s1}, \ldots, z_{sm}$. For example, $z_{s1}$ might be the proportion of persons in State s living in SMSA's; $z_{s2}$, the proportion of persons in State s having family income below the poverty level; and so forth. Equation (21) expresses the concept that, except for random variation, the percentage difference between a synthetic estimate and a true value is a linear function of a set of variables $z_{s1}, \ldots, z_{sm}$. The estimator given by equation (22) is called the regression-adjusted estimator, and it was used and evaluated by Levy⁷ in computing State estimates of deaths from motor vehicles for the year 1960. In that study, it was found to be an improvement over the synthetic estimator. However, it can be used only when relevant ancillary data are available.
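
The adjustment can be sketched for the simplest case of a single ancillary variable (m = 1). The sketch assumes that the percentage difference is measured relative to the synthetic estimate, that true values are known for a few calibration States, and that ordinary least squares is used to fit the coefficients. None of these choices is confirmed by the text; the function names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def percentage_differences(true_values: Sequence[float],
                           synthetic: Sequence[float]) -> List[float]:
    """Y_s = 100 * (true - synthetic) / synthetic (assumed sign convention)."""
    return [100.0 * (t - x) / x for t, x in zip(true_values, synthetic)]


def fit_simple_regression(z: Sequence[float],
                          y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = alpha + beta * z with one regressor."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    sxy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    beta = sxy / sxx
    alpha = y_bar - beta * z_bar
    return alpha, beta


def regression_adjusted(x_synthetic: float, z: float,
                        alpha: float, beta: float) -> float:
    """Scale the synthetic estimate by the predicted percentage difference."""
    return x_synthetic * (1.0 + 0.01 * (alpha + beta * z))


# Hypothetical calibration States: z is, say, the proportion living in SMSA's.
z_values = [0.2, 0.4, 0.6, 0.8]
synthetic_values = [20.0, 20.0, 20.0, 20.0]
true_values = [19.4, 19.8, 20.2, 20.6]

y_values = percentage_differences(true_values, synthetic_values)
alpha_hat, beta_hat = fit_simple_regression(z_values, y_values)
adjusted = regression_adjusted(20.0, 0.5, alpha_hat, beta_hat)
print(alpha_hat, beta_hat, adjusted)
```

With several ancillary variables the same idea applies, but the fit becomes a multiple regression rather than the closed-form slope and intercept used here.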
EVALUATION OF SYNTHETIC ESTIMATES

Background

A fundamental problem of the synthetic estimation procedure has been the difficulty in evaluating the estimates produced by this methodology. Although expressions have been derived for estimating the sampling variance of synthetic estimates, it is much more difficult to estimate the bias of a synthetic estimate, and since sampling errors are often small for synthetic estimates, the bias may often make the largest contribution to the total mean square error. A method has been developed, however, by investigators at the U.S. Bureau of the Census⁹,¹⁴ for estimating the mean square error of synthetic estimates by somewhat indirect means.

Another consideration of importance in obtaining synthetic estimates is their sensitivity to the particular set of α cells used in producing them. Although a more detailed α-cell grid should produce synthetic estimates having lower bias, the potential reduction in bias may in fact be small and may not justify the cost of obtaining the detailed α-cell grid. This issue has been addressed in an empirical study using the synthetic estimates of disability, utilization of health services, and limitation of activity based on 1969-71 data from the Health Interview Survey and is discussed in a later section.

Estimation of Mean Square Error (MSE)
and Average Mean Square Error (AMSE)
of Synthetic Estimates

A procedure has been developed by Waksberg and Gonzalez⁹ which enables the mean square error of a synthetic estimate to be estimated provided that an unbiased estimate of the same characteristic exists for the same population which is uncorrelated with the synthetic estimate. This procedure is developed by means of the theorems presented below:

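Theorem 6: Let $\tilde{X}_s$ estimate a parameter $\bar{X}_s$ with bias given by $B(\tilde{X}_s)$ and let $X'_s$ be an unbiased estimate of $\bar{X}_s$ which is uncorrelated with $\tilde{X}_s$. Then the following relation is true:

$$E(X'_s - \tilde{X}_s)^2 = \mathrm{MSE}_{\tilde{X}_s} + \sigma^2_{X'_s} \qquad (23)$$

where

$\mathrm{MSE}_{\tilde{X}_s}$ = the mean square error of $\tilde{X}_s$

and

$\sigma^2_{X'_s}$ = the variance of $X'_s$.

Theorem 7: If $\tilde{X}_s$ is an estimate of $\bar{X}_s$ with bias given by $B(\tilde{X}_s)$, if $X'_s$ is an unbiased estimate of $\bar{X}_s$ uncorrelated with $\tilde{X}_s$, and if $\hat{\sigma}^2_{X'_s}$ is an unbiased estimate of $\sigma^2_{X'_s}$, then the estimate $\widehat{\mathrm{MSE}}_{\tilde{X}_s}$ given by

$$\widehat{\mathrm{MSE}}_{\tilde{X}_s} = (X'_s - \tilde{X}_s)^2 - \hat{\sigma}^2_{X'_s} \qquad (24)$$

is an unbiased estimate of $\mathrm{MSE}_{\tilde{X}_s}$.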
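
A quick Monte Carlo check can illustrate the unbiasedness claim of theorem 7. The sketch below is not from the report: it assumes normally distributed estimates, a synthetic estimate with bias 0.5 and standard deviation 0.2 (so its true MSE is 0.25 + 0.04 = 0.29), an independent unbiased estimate with variance 1, and a variance estimate taken as the known constant 1. All names are hypothetical.

```python
import random


def mse_hat(direct: float, synthetic: float, var_hat: float) -> float:
    """Squared difference between the two estimates minus the variance estimate."""
    return (direct - synthetic) ** 2 - var_hat


def simulate_mean_mse_hat(reps: int = 200_000, seed: int = 1977) -> float:
    """Average the MSE estimate over many simulated pairs of estimates."""
    rng = random.Random(seed)
    true_value = 10.0
    total = 0.0
    for _ in range(reps):
        synthetic = rng.gauss(true_value + 0.5, 0.2)  # biased, low variance
        direct = rng.gauss(true_value, 1.0)           # unbiased, noisy, independent
        total += mse_hat(direct, synthetic, 1.0)      # variance treated as known
    return total / reps


if __name__ == "__main__":
    print(simulate_mean_mse_hat())  # should land near the true MSE of 0.29
```

A single value of the estimate can be far from the true MSE, and can even be negative, which is the instability discussed in the text that follows.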

Investigators at the Census Bureau have used the relationship given by equation (24) to evaluate synthetic estimates for certain variables such as unemployment where independent estimates are available. However, a serious limitation on the use of the estimated mean square error $\widehat{\mathrm{MSE}}_{\tilde{X}_s}$ as given in equation (24) is its likely instability, since the unbiased estimate $X'_s$ and the estimate $\hat{\sigma}^2_{X'_s}$ of its variance are both likely to have large variances themselves because, in all likelihood, they would be based on relatively small sample sizes. Aware of this, Gonzalez and Waksberg have introduced the concept of evaluating synthetic estimates not by their individual mean square errors but by the average mean square error (AMSE) of a set of synthetic estimates. Specifically, the AMSE of a set of S synthetic estimates is given by

$$\mathrm{AMSE} = \frac{1}{S}\sum_{s=1}^{S} \mathrm{MSE}_{\tilde{X}_s} \qquad (25)$$

which is estimated without bias by the expression $\widehat{\mathrm{AMSE}}$, given by

$$\widehat{\mathrm{AMSE}} = \frac{1}{S}\sum_{s=1}^{S} \widehat{\mathrm{MSE}}_{\tilde{X}_s} \qquad (26)$$

This statistic has been used with certain elaborations by Census Bureau investigators as the major criterion for evaluating synthetic estimates. A shortcoming of this criterion, however, is that it does not yield an estimate of the mean square error for a specific synthetic estimate (e.g., estimated unemployment in Ohio, 1976). Rather, it gives the average mean square error of a set of synthetic estimates.
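
As an illustration of the averaging idea, the following sketch computes a per-State MSE estimate (squared difference between the unbiased and synthetic estimates, minus the estimated variance of the unbiased estimate) and then averages over the set. The three-State data and the function names are hypothetical.

```python
from typing import List, Sequence


def per_state_mse_hats(direct: Sequence[float],
                       synthetic: Sequence[float],
                       var_hats: Sequence[float]) -> List[float]:
    """One MSE estimate per State; individual values may be negative."""
    return [(d - s) ** 2 - v for d, s, v in zip(direct, synthetic, var_hats)]


def amse_hat(direct: Sequence[float],
             synthetic: Sequence[float],
             var_hats: Sequence[float]) -> float:
    """Average of the per-State MSE estimates over the set of States."""
    terms = per_state_mse_hats(direct, synthetic, var_hats)
    return sum(terms) / len(terms)


# Hypothetical inputs for three States (not data from the report).
direct_estimates = [12.0, 9.0, 11.0]     # unbiased but noisy
synthetic_estimates = [10.0, 10.0, 10.0]
variance_estimates = [1.0, 0.5, 0.5]     # estimated variances of the direct estimates

print(per_state_mse_hats(direct_estimates, synthetic_estimates, variance_estimates))
print(amse_hat(direct_estimates, synthetic_estimates, variance_estimates))
```

Averaging trades State-level detail for stability: the noise in the individual terms partly cancels, but the result no longer says which State's synthetic estimate is poor.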

Evaluation of HIS Synthetic Estimates for
Alternative α-Cell Grids

Investigators at NCHS originally hoped to evaluate the 1969-71 HIS synthetic estimates by means of the AMSE criterion. However, there was no unbiased estimate uncorrelated with the synthetic estimate that could be used in equation (26). Although it is thought that the bias of the nearly unbiased estimate discussed above is likely to be small for HIS variables, and that the correlation between the synthetic estimate and the nearly unbiased estimate is also likely to be small, the task of obtaining a reasonable estimate of its variance is difficult since it is often based on data from one or two primary sampling units.

The main thrust in the evaluation of the 1969-71 HIS synthetic estimates was an empirical investigation comparing synthetic estimates based on the 50 α-cell grid used to obtain the published 1969-71 synthetic estimates with those obtained by collapsing the 50 cells into a smaller grid. In particular, synthetic estimates were obtained for the 50 States and the District of Columbia based on (1) the total 50 α-cell grid, (2) a 2 α-cell grid based only on sex, (3) a 4 α-cell grid based on age alone, (4) an 8 α-cell grid based on sex and age, (5) a 16 α-cell grid based on color, sex, and age, and (6) a 16 α-cell grid based on family income, sex, and age. The synthetic estimates produced by each of the collapsed grids (sex, age, sex by age, sex by age and color, and sex by age and income) were compared with the synthetic estimates produced by the total 50 α-cell grid by use of the following summary statistics:

1. The mean over all 50 States and the District of Columbia of the proportional absolute difference between the synthetic estimate $\tilde{X}_{sg}$ based on a particular grid and the synthetic estimate $\tilde{X}_s$ based on the total 50 α-cell grid (table D).

2. The maximum over all 50 States and the District of Columbia of the proportional absolute difference defined above (table E).

3. The correlation coefficient over all 50 States and the District of Columbia between the synthetic estimate produced by a collapsed grid and that produced by the total grid (table F).
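
The three summary statistics can be sketched as follows. The sketch assumes that the proportional absolute difference is the absolute difference divided by the estimate from the total 50-cell grid, and that the correlation is the ordinary Pearson coefficient. Both are assumptions, and the three-State figures and function names are hypothetical.

```python
import math
from typing import List, Sequence


def proportional_abs_differences(collapsed: Sequence[float],
                                 full: Sequence[float]) -> List[float]:
    """|collapsed - full| / full for each State (assumed denominator)."""
    return [abs(c - f) / f for c, f in zip(collapsed, full)]


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two sets of estimates."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical synthetic estimates for three States (not data from the report).
full_grid = [10.0, 20.0, 40.0]       # total 50-cell grid
collapsed_grid = [11.0, 19.0, 40.0]  # a collapsed grid, e.g. age alone

pads = proportional_abs_differences(collapsed_grid, full_grid)
mean_pad = sum(pads) / len(pads)     # statistic 1
max_pad = max(pads)                  # statistic 2
correlation = pearson_correlation(collapsed_grid, full_grid)  # statistic 3
print(mean_pad, max_pad, correlation)
```

The mean and the maximum answer different questions: the first describes typical agreement across States, while the second flags the single State where the choice of grid matters most.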

The mean proportional absolute difference (table D) is a measure of the average relative difference between synthetic estimates produced by a collapsed grid and those produced by the total grid. For the HIS variables considered in this study, synthetic estimates produced by each of the collapsed grids agreed closely by this criterion with synthetic estimates produced by the detailed 50 α-cell grid. For most of the 14 variables in this study, the mean proportional absolute difference was less than 5 percent, and the worst agreement by this criterion was shown by the synthetic estimates produced by the α-cell grid based on sex. In most cases, the synthetic estimates based on age by sex, age by sex and color, and age by sex and income did not show substantially better agreement by this criterion with those based on the total detailed grid than the synthetic estimates based on age alone.

The maximum proportional absolute difference (table E) gives a feeling of the extent that a single synthetic estimate based on a collapsed grid might differ from the comparable synthetic estimate based on the detailed 50-cell grid. The magnitudes of some of the statistics shown in table E imply that in individual States, the grid used to compute the synthetic estimates might affect the size of the estimate, even though the
Table D. Mean proportional absolute differences between the synthetic estimate produced by the total 50 α-cell grid and that produced by other α-cell grids for selected Health Interview Survey (HIS) variables, 1969-71

                                                                                 α-cell grid
HIS variable                                                        Sex      Age      Sex by   Sex by   Sex by
                                                                                      age      color    income
                                                                                               by age   by age

                                                                       Mean proportional absolute difference

Restricted activity days                                            .028     .017     .017     --       .020
Bed disability days                                                 .029     .026     .025     .032     .021
Work loss days                                                      .039     .026     .025     .032     .021
Hospital discharges per 100 person years                            .021     .008     .006     .008     .008
Average length of hospitalization                                   .035     .034     .034     .028     .023
Percent of persons having one or more hospital episodes in a year   .013     .008     .008     .007     .006
Percent of persons having one or more physician visits in a year    .008     .008     .008     .005     .007
Number of physician visits per person year                          .021     .016     .016     .015     .017
Percent of persons having one or more dental visits in a year       .031     .031     .031     .016     .027
Number of dental visits per person year                             .056     .056     .056     .032     .062
Percent not limited in activity                                     .008     .003     .003     .003     .004
Percent limited in activity                                         .061     .021     .021     .029     .026
Percent limited in amount or kind of major activity                 .064     .024     .024     .025     .027
Percent unable to carry on major activity                           .094     .058     --       .081     .057

-- = entry illegible in the source.



Table E. Maximum proportional absolute differences between the synthetic estimate produced by the total 50 α-cell grid and that produced by other α-cell grids for selected Health Interview Survey (HIS) variables, 1969-71

                                                                                 α-cell grid
HIS variable                                                        Sex      Age      Sex by   Sex by   Sex by
                                                                                      age      color    income
                                                                                               by age   by age

                                                                      Maximum proportional absolute difference

Restricted activity days                                            .103     .073     .073     .099     .114
Bed disability days                                                 .136     .134     .088     .089     .170
Work loss days                                                      --       .047     .140     --       .211
Hospital discharges per 100 person years                            .127     .065     .070     .034     .024
Average length of hospitalization                                   .188     .201     .203     .083     .266
Percent of persons having one or more hospital episodes in a year   .103     .057     .073     .026     .020
Percent of persons having one or more physician visits in a year    .042     .041     .042     .019     .027
Number of physician visits per person year                          .104     .084     .088     .050     .048
Percent of persons having one or more dental visits in a year       .221     --       .237     .058     .241
Number of dental visits per person year                             .382     .139     .398     .101     .402
Percent not limited in activity                                     .037     .013     .013     .014     .021
Percent limited in activity                                         .454     .094     .092     .130     .184
Percent limited in amount or kind of major activity                 .493     .088     --       .122     .154
Percent unable to carry on major activity                           .695     .233     .192     .264     .494

-- = entry illegible in the source.






Table F. Correlation coefficient between the synthetic estimate produced by the total 50 α-cell grid and that produced by other α-cell grids for selected Health Interview Survey (HIS) variables, 1969-71

HIS variable

Restricted activity days
Bed disability days
Work loss days
Hospital discharges per 100 person years
Average length of hospitalization
Percent of persons having one or more hospital episodes in a year
Percent of persons having one or more physician visits in a year
Number of physician visits per person year
Percent of persons having one or more dental visits in a year
Number of dental visits per person year
Percent not limited in activity
Percent limited in activity
Percent limited in amount or kind of major activity
Percent unable to carry on major activity

α-cell grid: Sex; Age; Sex by age; Sex by color by age; Sex by income by age

Correlation coefficient



.94 
.91 
.88 

.09 
30 
.81 
.93 
31 
.96 
.63 
.61 
.50 
.85 



.96 

~35 
:s9 

.89 
\98 

-8? 

30 
35 
..95 
35 
94. 
.94 



.96 

36 
.30 

.97 

.89 

.98 

.81 

.96 

30 
/$5 

.96 

,95 

.94 J 

s94 



.94 

3& 
.96 
.99 
.96 
39 
* 35 
.9T 
,.98 
39 
^.95 
*33 
* .94" 
.93 



.93 
..94 
.85* * 
.99 
.87 
.99 
.91 

98 t 

.92 

36 

32 

.93 

.93 ^ 



average proportional absolute differences are
small.

The correlation coefficient between synthetic estimates based on the detailed 50-cell grid and those based on a less detailed grid (table F) measures the strength of the relationship between the two sets of estimates. It is a measure that is particularly appropriate when it is desired to rank a set of estimates and when the absolute values of the estimates are of secondary interest. In general, the correlations were quite high, with estimates based on the sex grid showing the lowest correlation with those based on the detailed grid.
APPENDIX

PROOF OF LEMMAS AND THEOREMS



Lemma 1: The expectation $E(X'_s)$ of the nearly
unbiased estimator $X'_s$ is given by



Proof

We note that the expectation of $X'_s$ is given



1 r 



/-I. 



(1A) 



where 



= average level of X in stratum ;', 



1-1 



$N_j$ = the number of persons in stratum j,

$\bar{X}_{sj\cdot} = \sum_i n_{sji}\,\bar{X}_{sji} \,/\, n_{sj\cdot}$

= average level of X in that portion of
stratum j which is in State s,

and

$\bar{X}_{sji}$ = average level of characteristic X in
that portion of State s which is in PSU i
of stratum j



by 

* *» * • X : i 

since X, is an unbiased estimator of 'JF . . 

. *' t 
QED 

Lemma 2 : The bias in the .nearly un- 

biased estimate can be expressed 
by 

a 

*$s) -X (*,. - (2A) . 

* « ► ■ 

or equivalently by . -mm 
Proof



: • . • • - . 

_ ■ " . . . Lemmg_3: Let us assume that the rajio 

Since, by definition, the" bias BiT,) of T, is' '* • ' is " 1 * same for 311 PSU ' S in the same' 
equal to - '•*.• ' * ^ . : stratum in the same State, Which im- 



plies that 



where 1 - 



Then the bias in. the "nearly" unbiased 



; =UageWl of Yin State S ,relation # (5A) ^^'^ * C L ^ ^ ' ' 
follows directly f/pm iiemma 1 and the * 

r- definition of Jf. . 



We note that 



.-1 



(4A)' 



The results follow directly from relation
(5A) and lemma 2.

Therefore, relation (3A) follows from relation
(2A) by substitution of relation (4A) into relation
(2A).



QED



QED 



Theorem 4: Jf = — f or t=tf . . . , 7 then 



$B^2(X'_s)$, the square of the bias in
$X'_s$, is given by the expression



where 



Proof

By adding and subtracting to the right-hand side of relation (6A) we obtain

/ £ (^ : ^^y.-^v).^(8A) 



Squaring the right-hand side of relation (8A), we obtain

k . v 9 , ♦ 

e • 
^ i • •• , i , . 

\ I 

; . * -EtfttZ, itfM^ftAiM +2E E-^ E <vM E (fc- 



V 



J 7 



;«1 k<j n s.. 1 sj i sk i«l ( n'-l' * 



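But the first term in equation (9A) is given by

[Equation (10A) is illegible in the source.]

Therefore, by appropriate substitution of relation (10A) into (9A), we obtain the form specified by equation (7A).

QED

Theorem 5: If the $P'_{s\alpha}$ are independent of the $\bar{X}_\alpha$, then the variance of $\tilde{X}_s$, given by

$$\sigma^2_{\tilde{X}_s} = \sum_{\alpha} \mathrm{Var}(P'_{s\alpha}\bar{X}_\alpha) + 2\sum_{\alpha<\alpha'} \mathrm{Cov}(P'_{s\alpha}\bar{X}_\alpha,\; P'_{s\alpha'}\bar{X}_{\alpha'}) \qquad (11A)$$

reduces for large values of $n_s$ to

$$\sigma^2_{\tilde{X}_s} = \sum_{\alpha} P^2_{s\alpha}\,\sigma^2_{\bar{X}_\alpha} + \frac{1}{n_s}\sum_{\alpha} \bar{X}^2_\alpha\,P_{s\alpha}(1 - P_{s\alpha}) + 2\sum_{\alpha<\alpha'} P_{s\alpha}P_{s\alpha'}\,\mathrm{Cov}(\bar{X}_\alpha, \bar{X}_{\alpha'}) \qquad (12A)$$

Proof

The second term in the right-hand side of equation (11A) is given by

$$\mathrm{Cov}(P'_{s\alpha}\bar{X}_\alpha,\; P'_{s\alpha'}\bar{X}_{\alpha'}) = E(P'_{s\alpha}P'_{s\alpha'})\,E(\bar{X}_\alpha\bar{X}_{\alpha'}) - E(P'_{s\alpha})\,E(P'_{s\alpha'})\,E(\bar{X}_\alpha)\,E(\bar{X}_{\alpha'}) \qquad (13A)$$

since $P'_{s\alpha}$ is independent of $\bar{X}_\alpha$. But

$E(P'_{s\alpha}) = P_{s\alpha}$ (the true proportion of State s falling into cell α), and

$$\mathrm{Cov}(P'_{s\alpha}, P'_{s\alpha'}) = -\frac{P_{s\alpha}P_{s\alpha'}}{n_s} \qquad (14A)$$

where

$n_s$ = the sample size in State s used for estimating the $P_{s\alpha}$ (e.g., for a State having 1 million persons and for the 1-Percent Public Use Sample Tapes, $n_s$ = 10,000).

Therefore,

$$E(P'_{s\alpha}P'_{s\alpha'}) = \mathrm{Cov}(P'_{s\alpha}, P'_{s\alpha'}) + E(P'_{s\alpha})\,E(P'_{s\alpha'}) = -\frac{P_{s\alpha}P_{s\alpha'}}{n_s} + P_{s\alpha}P_{s\alpha'} = \frac{n_s - 1}{n_s}\,P_{s\alpha}P_{s\alpha'} \qquad (15A)$$

and for large $n_s$

$$E(P'_{s\alpha}P'_{s\alpha'}) \approx P_{s\alpha}P_{s\alpha'} \qquad (16A)$$

Therefore, from equations (13A) and (16A),

$$\mathrm{Cov}(P'_{s\alpha}\bar{X}_\alpha,\; P'_{s\alpha'}\bar{X}_{\alpha'}) = P_{s\alpha}P_{s\alpha'}\,\mathrm{Cov}(\bar{X}_\alpha, \bar{X}_{\alpha'}) \qquad (17A)$$

Hansen, Hurwitz, and Madow¹⁵ show that

$$\mathrm{Var}(P'_{s\alpha}\bar{X}_\alpha) = P^2_{s\alpha}\bar{X}^2_\alpha \left[\frac{\mathrm{Var}(P'_{s\alpha})}{P^2_{s\alpha}} + \frac{\mathrm{Var}(\bar{X}_\alpha)}{\bar{X}^2_\alpha} + \frac{2\,\mathrm{Cov}(P'_{s\alpha}, \bar{X}_\alpha)}{P_{s\alpha}\bar{X}_\alpha}\right]$$

which reduces to

$$\mathrm{Var}(P'_{s\alpha}\bar{X}_\alpha) = P^2_{s\alpha}\bar{X}^2_\alpha \left[\frac{\mathrm{Var}(P'_{s\alpha})}{P^2_{s\alpha}} + \frac{\mathrm{Var}(\bar{X}_\alpha)}{\bar{X}^2_\alpha}\right]$$

since

$$\mathrm{Cov}(P'_{s\alpha}, \bar{X}_\alpha) = 0$$

and this reduces further to a form given by

$$\mathrm{Var}(P'_{s\alpha}\bar{X}_\alpha) = \bar{X}^2_\alpha\,\mathrm{Var}(P'_{s\alpha}) + P^2_{s\alpha}\,\mathrm{Var}(\bar{X}_\alpha) = \bar{X}^2_\alpha\,\frac{P_{s\alpha}(1 - P_{s\alpha})}{n_s} + P^2_{s\alpha}\,\sigma^2_{\bar{X}_\alpha} \qquad (19A)$$

Substituting equations (17A) and (19A) into equation (11A),

$$\sigma^2_{\tilde{X}_s} = \sum_{\alpha} P^2_{s\alpha}\,\sigma^2_{\bar{X}_\alpha} + \frac{1}{n_s}\sum_{\alpha} \bar{X}^2_\alpha\,P_{s\alpha}(1 - P_{s\alpha}) + 2\sum_{\alpha<\alpha'} P_{s\alpha}P_{s\alpha'}\,\mathrm{Cov}(\bar{X}_\alpha, \bar{X}_{\alpha'})$$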

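Theorem 6: Let $\tilde{X}_s$ estimate a parameter $\bar{X}_s$ with bias given by $B(\tilde{X}_s)$ and let $X'_s$ be an unbiased estimate of $\bar{X}_s$ which is uncorrelated with $\tilde{X}_s$. Then the following relation is true:

$$E(X'_s - \tilde{X}_s)^2 = \mathrm{MSE}_{\tilde{X}_s} + \sigma^2_{X'_s} \qquad (20A)$$

where

$\mathrm{MSE}_{\tilde{X}_s}$ = the mean square error of $\tilde{X}_s$

and

$\sigma^2_{X'_s}$ = the variance of $X'_s$.

Proof

$$E(X'_s - \tilde{X}_s)^2 = E[(X'_s - \bar{X}_s) + (\bar{X}_s - \tilde{X}_s)]^2$$

$$= E(X'_s - \bar{X}_s)^2 + E(\tilde{X}_s - \bar{X}_s)^2 - 2E(X'_s - \bar{X}_s)(\tilde{X}_s - \bar{X}_s)$$

but

$$E(X'_s - \bar{X}_s)^2 = \sigma^2_{X'_s}$$

and

$$E(\tilde{X}_s - \bar{X}_s)^2 = \mathrm{MSE}_{\tilde{X}_s}.$$

Also, it can be shown that

$$E(X'_s - \bar{X}_s)(\tilde{X}_s - \bar{X}_s) = \mathrm{cov}(X'_s, \tilde{X}_s) = 0$$

since $X'_s$ and $\tilde{X}_s$ are uncorrelated and $B(X'_s) = 0$.

QED

Theorem 7: If $\tilde{X}_s$ is an estimate of $\bar{X}_s$ with bias given by $B(\tilde{X}_s)$, if $X'_s$ is an unbiased estimate of $\bar{X}_s$ uncorrelated with $\tilde{X}_s$, and if $\hat{\sigma}^2_{X'_s}$ is an unbiased estimate of $\sigma^2_{X'_s}$, then the estimate $\widehat{\mathrm{MSE}}_{\tilde{X}_s}$ given by

$$\widehat{\mathrm{MSE}}_{\tilde{X}_s} = (X'_s - \tilde{X}_s)^2 - \hat{\sigma}^2_{X'_s} \qquad (21A)$$

is an unbiased estimate of $\mathrm{MSE}_{\tilde{X}_s}$.

Proof

Proof follows directly from theorem 6.

QED

NOTE: A list of references follows the text.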